object detection network
Supplementary for Mixed Supervised Object Detection by Transferring Mask Prior and Semantic Similarity
Our ablation studies (Table 3 in the main paper) have already proved the advantage of mask prior. From Figure 2, we can see that the coarse masks indicate the rough locations of objects, which can help the object detection network predict the bounding boxes. To validate the transferability of our similarity transfer, we evaluate our similarity network trained on the COCO-60 trainval set. We treat the similarity prediction task as a binary classification task, in which the binary label 1 (resp., 0) means that two bounding boxes belong to the same category (resp., different categories). The precision, recall and F1 scores are summarized in Table 1. We observe that the gap between the performance of the similarity network on base categories and novel categories is negligible (e.g., F1 scores 84.9% vs. …
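The evaluation protocol above treats each pair of boxes as one binary decision. A minimal sketch of how precision, recall and F1 are computed from thresholded similarity scores (the function name and the 0.5 threshold are illustrative assumptions, not the paper's code):

```python
# Hypothetical sketch of the pairwise evaluation described above.
# Labels: 1 if two boxes share a category, 0 otherwise.

def pairwise_f1(scores, labels, threshold=0.5):
    """Precision, recall and F1 for thresholded pairwise similarity scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Running this separately on base-category and novel-category pairs gives the two F1 numbers whose gap the authors report.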
MonoUNI: A Unified Vehicle and Infrastructure-side Monocular 3D Object Detection Network with Sufficient Depth Clues
Monocular 3D detection of vehicle and infrastructure sides are two important topics in autonomous driving. Due to diverse sensor installations and focal lengths, researchers are faced with the challenge of constructing algorithms for the two topics based on different prior knowledge. In this paper, by taking into account the diversity of pitch angles and focal lengths, we propose a unified optimization target named normalized depth, which realizes the unification of 3D detection problems for the two sides. Furthermore, to enhance the accuracy of monocular 3D detection, 3D normalized cube depth of obstacle is developed to promote the learning of depth information. We posit that the richness of depth clues is a pivotal factor impacting the detection performance on both the vehicle and infrastructure sides. A richer set of depth clues facilitates the model to learn better spatial knowledge, and the 3D normalized cube depth offers sufficient depth clues. Extensive experiments demonstrate the effectiveness of our approach. Without introducing any extra information, our method, named MonoUNI, achieves state-of-the-art performance on five widely used monocular 3D detection benchmarks, including Rope3D and DAIR-V2X-I for the infrastructure side, KITTI and Waymo for the vehicle side, and nuScenes for the cross-dataset evaluation.
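To illustrate the unification idea (this is a generic pinhole-camera sketch under our own assumptions, not MonoUNI's exact formulation, which also accounts for pitch angle): for a fixed object, apparent pixel size scales with f/d, so rescaling depth by a reference focal length makes the regression target comparable across cameras with different lenses.

```python
# Illustrative focal-length normalization of a depth target.
# `f_ref` and both function names are assumptions for this sketch.

def normalize_depth(depth, focal, f_ref=1000.0):
    """Map a metric depth (m) to a focal-length-normalized training target."""
    return depth * f_ref / focal

def denormalize_depth(norm_depth, focal, f_ref=1000.0):
    """Recover metric depth from the normalized prediction at inference."""
    return norm_depth * focal / f_ref
```

The network then regresses the normalized value, and each camera's own focal length converts the prediction back to metric depth.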
Supplementary for Mixed Supervised Object Detection by Transferring Mask Prior and Semantic Similarity
Yan Liu
In this supplementary material, we will provide more analyses of mask prior in Section 1 and similarity transfer in Section 2. We will show the visualization results in Section 3 and the performance variance (Figure 2) to better investigate the effectiveness of mask prior. We treat the similarity prediction task as a binary classification task, in which the binary label 1 (resp., 0) means that two bounding boxes belong to the same category (resp., different categories). For the VOC test set, we repeat the above procedure. The precision, recall and F1 scores are summarized in Table 1. We observe that the gap between the performance of the similarity network on base categories and novel categories is negligible. We take bounding boxes of the category "person" and calculate their pairwise semantic similarity scores by applying the trained similarity network. It is worth noting that the average similarity score will be affected slightly by the number of outliers if the batch size increases to a large scale (e.g., 64).
A Fourier-enhanced multi-modal 3D small object optical mark recognition and positioning method for percutaneous abdominal puncture surgical navigation
Guo, Zezhao, Guo, Yanzhong, Zhao, Zhanfang
Navigation for thoracoabdominal puncture surgery is used to locate the needle entry point on the patient's body surface. The traditional reflective-ball navigation method struggles to position the needle entry point on the soft, irregular, smooth chest and abdomen, and because structured-light technology finds few clear feature points on the body surface, it is difficult to identify and locate arbitrary needle insertion points. To meet the high-stability and high-accuracy requirements of surgical navigation, this paper proposes a novel multi-modal 3D small-object medical marker detection method, which identifies the center of a small single ring as the needle insertion point. Moreover, the method leverages Fourier-transform enhancement to augment the dataset, enrich image details, and strengthen the network's capability. It extracts the Region of Interest (ROI) of the feature image from both enhanced and original images, then generates a mask map. Subsequently, the point cloud of the ROI is obtained from the depth map through registration of ROI point-cloud contour fitting. In addition, the method employs the Tukey loss for optimal precision. Experimental results show that the method proposed in this paper not only achieves high-precision, high-stability positioning, but also enables the positioning of any needle insertion point.
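The Tukey loss mentioned above is the standard biweight (bisquare) robust loss: roughly quadratic near zero but saturating for large residuals, so outlier correspondences in the fitting step cannot dominate. A minimal sketch of that standard form (the tuning constant c = 4.685 is the common default, assumed here, not taken from the paper):

```python
def tukey_biweight_loss(residual, c=4.685):
    """Tukey's biweight loss: behaves like a scaled quadratic for small
    residuals and saturates at c*c/6 for |residual| > c, which caps the
    influence of outliers on the fit."""
    if abs(residual) <= c:
        return (c * c / 6.0) * (1.0 - (1.0 - (residual / c) ** 2) ** 3)
    return c * c / 6.0
```

Compared with an L2 loss, a point 100 units off contributes the same fixed penalty as one barely past c, which is what makes the contour fitting stable.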
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.89)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Transformer based Multitask Learning for Image Captioning and Object Detection
Basak, Debolena, Srijith, P. K., Desarkar, Maunendra Sankar
In several real-world scenarios, such as autonomous navigation and mobility, image captioning and object detection play a crucial role in building a better visual understanding of the surroundings. This work introduces a novel multitask learning framework that combines image captioning and object detection into a joint model. We propose TICOD, a Transformer-based Image Captioning and Object Detection model that trains both tasks jointly by combining the losses obtained from the image captioning and object detection networks. By leveraging joint training, the model benefits from the complementary information shared between the two tasks, leading to improved performance for image captioning. Our approach utilizes a transformer-based architecture that enables end-to-end network integration for image captioning and object detection and performs both tasks jointly. We evaluate the effectiveness of our approach through comprehensive experiments on the MS-COCO dataset. Our model outperforms the baselines from the image captioning literature by achieving a 3.65% improvement in BERTScore.
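The loss-combination step described above can be sketched as a weighted sum driving one backward pass; the weight `lam` and the function name are illustrative assumptions, not TICOD's actual interface:

```python
# Hedged sketch of joint multitask training: a shared backbone feeds a
# captioning head and a detection head, and their losses are summed into
# one scalar so gradients from both tasks update the shared weights.

def joint_loss(caption_loss: float, detection_loss: float, lam: float = 1.0) -> float:
    """Combine per-task losses into a single scalar for backpropagation."""
    return caption_loss + lam * detection_loss
```

Tuning `lam` trades off the two tasks; the complementary gradients through the shared backbone are what the abstract credits for the captioning improvement.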
Improving Warped Planar Object Detection Network For Automatic License Plate Recognition
Tra, Nguyen Dinh, Tri, Nguyen Cong, Hung, Phan Duy
This paper aims to improve the Warped Planar Object Detection Network (WPOD-Net) using feature engineering to increase accuracy. More specifically, we argue that adding knowledge about edges in the image enhances the information used to determine the license plate contour in the original WPOD-Net model. A Sobel filter, selected experimentally, acts as a Convolutional Neural Network layer; its edge information is combined with the features of the original network to create the final embedding vector. The proposed model was compared with the original model on a dataset that we collected for evaluation. The results, evaluated by the Quadrilateral Intersection over Union value, demonstrate a significant improvement in performance.
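The "Sobel filter as a CNN layer" idea amounts to a convolution with fixed, non-learned kernels whose response is concatenated with the learned features. A plain-NumPy stand-in for that layer (the helper names are ours, and a real network would use a framework conv op with frozen weights):

```python
import numpy as np

# Fixed Sobel kernels: horizontal- and vertical-gradient detectors.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def conv2d_valid(img, kernel):
    """Valid-mode 2D cross-correlation (what CNN 'convolution' computes)."""
    h, w = img.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def sobel_edges(img):
    """Gradient magnitude from the two fixed Sobel responses; this map
    would be concatenated with the learned feature maps."""
    gx = conv2d_valid(img, SOBEL_X)
    gy = conv2d_valid(img, SOBEL_Y)
    return np.hypot(gx, gy)
```

Because the kernels are frozen, the layer adds edge evidence for the plate contour without adding trainable parameters.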
Building the world's largest outdoor AI artwork
The idea was straightforward: create an interactive artwork that highlights Baden-Württemberg and its Cyber Valley initiative as the epicenter of artificial intelligence on our continent. To do this properly, we had to think big and act fast, as the exhibition would start in a mere five weeks, and it was too tempting not to build something monumental for the public space. Put simply, our plan involved capturing the scene at the front, feeding it to a supercomputer and teleporting the results onto a gigantic display. The work thereby shows an altered mirror image of reality, all created by a live-dreaming AI. We wanted observers to dive in and become part of an art piece that is generated by machine intelligence and live-streamed to the internet.
Evolution Of Object Detection Networks - YouTube
Intuition lectures on topics ranging from classical CV techniques like HOG and SIFT to Convolutional Neural Network based techniques like OverFeat, Faster R-CNN, etc. You will learn how the ideas have evolved from some of the earliest papers to current ones.